[BugFix] Fix parsing integer batch size within export #1004
base: gh/vmoens/18/base
ghstack-source-id: 73e7dd429770e1c383b3b2a1c28dbbf661d65f07
Pull Request resolved: #1004
Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 90.3460μs | 25.6517μs | 38.9838 KOps/s | 42.0286 KOps/s | |
test_plain_set_stack_nested | 81.3820μs | 25.6494μs | 38.9873 KOps/s | 40.9080 KOps/s | |
test_plain_set_nested_inplace | 78.8670μs | 28.2397μs | 35.4111 KOps/s | 37.8495 KOps/s | |
test_plain_set_stack_nested_inplace | 67.5450μs | 28.2379μs | 35.4134 KOps/s | 37.8340 KOps/s | |
test_items | 24.2650μs | 4.3290μs | 230.9977 KOps/s | 240.9439 KOps/s | |
test_items_nested | 0.5422ms | 0.3811ms | 2.6240 KOps/s | 2.6290 KOps/s | |
test_items_nested_locked | 0.5778ms | 0.3827ms | 2.6129 KOps/s | 2.6256 KOps/s | |
test_items_nested_leaf | 0.1465ms | 80.0650μs | 12.4899 KOps/s | 12.4847 KOps/s | |
test_items_stack_nested | 0.4875ms | 0.3882ms | 2.5757 KOps/s | 2.6263 KOps/s | |
test_items_stack_nested_leaf | 0.1505ms | 85.2998μs | 11.7234 KOps/s | 12.0747 KOps/s | |
test_items_stack_nested_locked | 0.5917ms | 0.3906ms | 2.5600 KOps/s | 2.5923 KOps/s | |
test_keys | 51.1150μs | 3.5094μs | 284.9465 KOps/s | 286.7259 KOps/s | |
test_keys_nested | 0.2206ms | 0.1342ms | 7.4539 KOps/s | 7.4895 KOps/s | |
test_keys_nested_locked | 0.7522ms | 0.1395ms | 7.1692 KOps/s | 7.2292 KOps/s | |
test_keys_nested_leaf | 0.1981ms | 0.1168ms | 8.5584 KOps/s | 8.6360 KOps/s | |
test_keys_stack_nested | 0.3068ms | 0.1338ms | 7.4720 KOps/s | 7.4816 KOps/s | |
test_keys_stack_nested_leaf | 0.2151ms | 0.1163ms | 8.5995 KOps/s | 8.8809 KOps/s | |
test_keys_stack_nested_locked | 0.2260ms | 0.1395ms | 7.1677 KOps/s | 7.3793 KOps/s | |
test_values | 14.8114μs | 1.0402μs | 961.3325 KOps/s | 920.2927 KOps/s | |
test_values_nested | 0.1428ms | 92.8479μs | 10.7703 KOps/s | 10.7425 KOps/s | |
test_values_nested_locked | 0.1466ms | 92.9801μs | 10.7550 KOps/s | 10.7332 KOps/s | |
test_values_nested_leaf | 0.1367ms | 79.1348μs | 12.6367 KOps/s | 12.7436 KOps/s | |
test_values_stack_nested | 0.1686ms | 93.2474μs | 10.7242 KOps/s | 10.1606 KOps/s | |
test_values_stack_nested_leaf | 0.1246ms | 78.6717μs | 12.7111 KOps/s | 13.1709 KOps/s | |
test_values_stack_nested_locked | 0.1513ms | 93.1508μs | 10.7353 KOps/s | 10.7000 KOps/s | |
test_membership | 25.2780μs | 0.8864μs | 1.1282 MOps/s | 1.4073 MOps/s | |
test_membership_nested | 29.3550μs | 2.8027μs | 356.8007 KOps/s | 359.7703 KOps/s | |
test_membership_nested_leaf | 25.1870μs | 2.7935μs | 357.9785 KOps/s | 365.9989 KOps/s | |
test_membership_stacked_nested | 22.3420μs | 2.7841μs | 359.1782 KOps/s | 367.9815 KOps/s | |
test_membership_stacked_nested_leaf | 49.9830μs | 2.7978μs | 357.4186 KOps/s | 366.3923 KOps/s | |
test_membership_nested_last | 32.4500μs | 4.2370μs | 236.0150 KOps/s | 241.1513 KOps/s | |
test_membership_nested_leaf_last | 45.8250μs | 4.2763μs | 233.8492 KOps/s | 240.6598 KOps/s | |
test_membership_stacked_nested_last | 31.8490μs | 4.9849μs | 200.6049 KOps/s | 84.5008 KOps/s | |
test_membership_stacked_nested_leaf_last | 33.3420μs | 5.0441μs | 198.2495 KOps/s | 84.4283 KOps/s | |
test_nested_getleaf | 65.7520μs | 10.7630μs | 92.9108 KOps/s | 96.1892 KOps/s | |
test_nested_get | 43.0290μs | 9.9605μs | 100.3969 KOps/s | 100.6425 KOps/s | |
test_stacked_getleaf | 42.5590μs | 10.5680μs | 94.6251 KOps/s | 95.5818 KOps/s | |
test_stacked_get | 76.7630μs | 9.9376μs | 100.6282 KOps/s | 100.0912 KOps/s | |
test_nested_getitemleaf | 43.9220μs | 11.0345μs | 90.6252 KOps/s | 91.3562 KOps/s | |
test_nested_getitem | 39.9950μs | 10.2259μs | 97.7913 KOps/s | 97.2067 KOps/s | |
test_stacked_getitemleaf | 53.0590μs | 11.0677μs | 90.3526 KOps/s | 91.2083 KOps/s | |
test_stacked_getitem | 46.2160μs | 10.1876μs | 98.1587 KOps/s | 97.7682 KOps/s | |
test_lock_nested | 1.4056ms | 0.5140ms | 1.9457 KOps/s | 1.9695 KOps/s | |
test_lock_stack_nested | 0.8064ms | 0.4832ms | 2.0696 KOps/s | 2.1758 KOps/s | |
test_unlock_nested | 0.1144s | 0.5461ms | 1.8313 KOps/s | 2.3585 KOps/s | |
test_unlock_stack_nested | 0.7236ms | 0.3988ms | 2.5074 KOps/s | 2.6514 KOps/s | |
test_flatten_speed | 0.2139ms | 0.1032ms | 9.6896 KOps/s | 10.0849 KOps/s | |
test_unflatten_speed | 0.6890ms | 0.5155ms | 1.9400 KOps/s | 1.9742 KOps/s | |
test_common_ops | 2.1192ms | 1.2013ms | 832.4249 Ops/s | 868.4444 Ops/s | |
test_creation | 32.9710μs | 2.1587μs | 463.2498 KOps/s | 479.3299 KOps/s | |
test_creation_empty | 68.7080μs | 20.9910μs | 47.6394 KOps/s | 51.5555 KOps/s | |
test_creation_nested_1 | 99.9760μs | 24.5746μs | 40.6924 KOps/s | 44.7465 KOps/s | |
test_creation_nested_2 | 66.8050μs | 28.8482μs | 34.6642 KOps/s | 37.7520 KOps/s | |
test_clone | 0.1566ms | 17.5888μs | 56.8543 KOps/s | 57.4949 KOps/s | |
test_getitem[int] | 1.0820ms | 17.1075μs | 58.4537 KOps/s | 59.7480 KOps/s | |
test_getitem[slice_int] | 0.1477ms | 31.4669μs | 31.7794 KOps/s | 32.6219 KOps/s | |
test_getitem[range] | 0.2332ms | 58.3961μs | 17.1244 KOps/s | 17.1216 KOps/s | |
test_getitem[tuple] | 0.1522ms | 25.2077μs | 39.6705 KOps/s | 39.8945 KOps/s | |
test_getitem[list] | 0.3560ms | 53.3721μs | 18.7364 KOps/s | 18.8335 KOps/s | |
test_setitem_dim[int] | 94.0750μs | 34.6007μs | 28.9011 KOps/s | 29.7894 KOps/s | |
test_setitem_dim[slice_int] | 0.1094ms | 61.2428μs | 16.3285 KOps/s | 15.9856 KOps/s | |
test_setitem_dim[range] | 0.1448ms | 85.9348μs | 11.6367 KOps/s | 11.4770 KOps/s | |
test_setitem_dim[tuple] | 0.1352ms | 51.4119μs | 19.4507 KOps/s | 19.7706 KOps/s | |
test_setitem | 0.2105ms | 32.7452μs | 30.5388 KOps/s | 32.2678 KOps/s | |
test_set | 0.1795ms | 32.1109μs | 31.1421 KOps/s | 33.8446 KOps/s | |
test_set_shared | 3.5917ms | 0.2264ms | 4.4161 KOps/s | 4.4667 KOps/s | |
test_update | 0.2021ms | 41.5079μs | 24.0918 KOps/s | 25.7393 KOps/s | |
test_update_nested | 0.2004ms | 53.4803μs | 18.6985 KOps/s | 19.7518 KOps/s | |
test_update__nested | 0.9018ms | 45.8381μs | 21.8159 KOps/s | 22.1145 KOps/s | |
test_set_nested | 0.1837ms | 34.4815μs | 29.0011 KOps/s | 30.4805 KOps/s | |
test_set_nested_new | 0.2078ms | 39.7404μs | 25.1633 KOps/s | 26.3058 KOps/s | |
test_select | 0.3030ms | 58.3827μs | 17.1284 KOps/s | 17.7053 KOps/s | |
test_select_nested | 0.1600ms | 61.1680μs | 16.3484 KOps/s | 16.8708 KOps/s | |
test_exclude_nested | 0.1357ms | 75.8400μs | 13.1856 KOps/s | 13.4525 KOps/s | |
test_empty[True] | 1.0163ms | 0.3571ms | 2.8002 KOps/s | 2.7306 KOps/s | |
test_empty[False] | 8.5158μs | 1.2659μs | 789.9477 KOps/s | 804.8738 KOps/s | |
test_unbind_speed | 0.4107ms | 0.3108ms | 3.2170 KOps/s | 3.2572 KOps/s | |
test_unbind_speed_stack0 | 0.6474ms | 0.3038ms | 3.2914 KOps/s | 3.4075 KOps/s | |
test_unbind_speed_stack1 | 0.1218s | 0.7693ms | 1.2999 KOps/s | 1.3612 KOps/s | |
test_split | 0.1195s | 2.4983ms | 400.2720 Ops/s | 452.8289 Ops/s | |
test_chunk | 2.2206ms | 2.0208ms | 494.8617 Ops/s | 450.0858 Ops/s | |
test_creation[device0] | 0.2757ms | 0.1183ms | 8.4503 KOps/s | 8.4579 KOps/s | |
test_creation_from_tensor | 3.9548ms | 0.1204ms | 8.3030 KOps/s | 8.5199 KOps/s | |
test_add_one[memmap_tensor0] | 0.4583ms | 7.4563μs | 134.1143 KOps/s | 138.1089 KOps/s | |
test_contiguous[memmap_tensor0] | 17.4730μs | 1.9325μs | 517.4749 KOps/s | 538.7640 KOps/s | |
test_stack[memmap_tensor0] | 0.1012ms | 5.7658μs | 173.4376 KOps/s | 174.4006 KOps/s | |
test_memmaptd_index | 0.1201s | 0.5803ms | 1.7231 KOps/s | 2.4246 KOps/s | |
test_memmaptd_index_astensor | 1.1676ms | 0.5138ms | 1.9463 KOps/s | 1.9479 KOps/s | |
test_memmaptd_index_op | 1.8566ms | 1.1258ms | 888.2426 Ops/s | 929.2522 Ops/s | |
test_serialize_model | 0.1341s | 0.1247s | 8.0180 Ops/s | 8.4068 Ops/s | |
test_serialize_model_pickle | 0.4463s | 0.3948s | 2.5329 Ops/s | 2.4925 Ops/s | |
test_serialize_weights | 0.1273s | 0.1210s | 8.2632 Ops/s | 7.3210 Ops/s | |
test_serialize_weights_returnearly | 0.1722s | 0.1646s | 6.0765 Ops/s | 6.2442 Ops/s | |
test_serialize_weights_pickle | 0.5519s | 0.4274s | 2.3398 Ops/s | 1.1854 Ops/s | |
test_serialize_weights_filesystem | 0.1556s | 0.1457s | 6.8620 Ops/s | 7.1037 Ops/s | |
test_serialize_model_filesystem | 0.1581s | 0.1461s | 6.8449 Ops/s | 6.5542 Ops/s | |
test_reshape_pytree | 80.2790μs | 38.9142μs | 25.6976 KOps/s | 25.8098 KOps/s | |
test_reshape_td | 0.1025ms | 46.5027μs | 21.5041 KOps/s | 21.4612 KOps/s | |
test_view_pytree | 94.8960μs | 38.9343μs | 25.6843 KOps/s | 25.8903 KOps/s | |
test_view_td | 0.1125ms | 51.8703μs | 19.2788 KOps/s | 19.1849 KOps/s | |
test_unbind_pytree | 0.1212ms | 36.8564μs | 27.1323 KOps/s | 27.9528 KOps/s | |
test_unbind_td | 0.4339ms | 45.6671μs | 21.8976 KOps/s | 22.2974 KOps/s | |
test_split_pytree | 99.3340μs | 38.1371μs | 26.2212 KOps/s | 26.0564 KOps/s | |
test_split_td | 0.2258ms | 57.6999μs | 17.3310 KOps/s | 17.3825 KOps/s | |
test_add_pytree | 0.1141ms | 45.4140μs | 22.0196 KOps/s | 21.4058 KOps/s | |
test_add_td | 0.2285ms | 89.7861μs | 11.1376 KOps/s | 11.0365 KOps/s | |
test_compile_add_one_nested[tensordict-compile] | 0.1394ms | 73.2231μs | 13.6569 KOps/s | 13.6497 KOps/s | |
test_compile_add_one_nested[tensordict-eager] | 0.4277ms | 0.2024ms | 4.9412 KOps/s | 4.8086 KOps/s | |
test_compile_add_one_nested[pytree-compile] | 0.1851ms | 54.8569μs | 18.2292 KOps/s | 18.0265 KOps/s | |
test_compile_add_one_nested[pytree-eager] | 0.2610ms | 0.1462ms | 6.8406 KOps/s | 6.7055 KOps/s | |
test_compile_copy_nested[tensordict-compile] | 75.4810μs | 28.4837μs | 35.1078 KOps/s | 35.9707 KOps/s | |
test_compile_copy_nested[tensordict-eager] | 0.1460ms | 76.9121μs | 13.0019 KOps/s | 13.0541 KOps/s | |
test_compile_copy_nested[pytree-compile] | 0.1434ms | 79.2714μs | 12.6149 KOps/s | 13.0686 KOps/s | |
test_compile_copy_nested[pytree-eager] | 0.1264ms | 67.9471μs | 14.7173 KOps/s | 15.1727 KOps/s | |
test_compile_add_one_flat[tensordict-compile] | 0.2763ms | 0.1252ms | 7.9855 KOps/s | 7.9830 KOps/s | |
test_compile_add_one_flat[tensordict-eager] | 1.8540ms | 0.2496ms | 4.0071 KOps/s | 4.0449 KOps/s | |
test_compile_add_one_flat[tensorclass-compile] | 0.1391ms | 55.4630μs | 18.0300 KOps/s | 18.3146 KOps/s | |
test_compile_add_one_flat[tensorclass-eager] | 0.7200ms | 82.0697μs | 12.1848 KOps/s | 12.4255 KOps/s | |
test_compile_add_one_flat[pytree-compile] | 0.1893ms | 0.1128ms | 8.8623 KOps/s | 8.9123 KOps/s | |
test_compile_add_one_flat[pytree-eager] | 0.4141ms | 0.3024ms | 3.3069 KOps/s | 3.2888 KOps/s | |
test_compile_add_self_flat[tensordict-eager] | 0.5239ms | 0.2803ms | 3.5676 KOps/s | 3.5987 KOps/s | |
test_compile_add_self_flat[tensordict-compile] | 0.2641ms | 0.1286ms | 7.7732 KOps/s | 8.1619 KOps/s | |
test_compile_add_self_flat[tensorclass-eager] | 0.1872ms | 76.9241μs | 12.9998 KOps/s | 13.3826 KOps/s | |
test_compile_add_self_flat[tensorclass-compile] | 0.1350ms | 56.3059μs | 17.7601 KOps/s | 18.4762 KOps/s | |
test_compile_add_self_flat[pytree-eager] | 0.4250ms | 0.2474ms | 4.0423 KOps/s | 4.0704 KOps/s | |
test_compile_add_self_flat[pytree-compile] | 0.2067ms | 0.1124ms | 8.8971 KOps/s | 8.8604 KOps/s | |
test_compile_copy_flat[tensordict-compile] | 92.0110μs | 30.6715μs | 32.6036 KOps/s | 32.1579 KOps/s | |
test_compile_copy_flat[tensordict-eager] | 0.1673ms | 79.0016μs | 12.6580 KOps/s | 13.0809 KOps/s | |
test_compile_copy_flat[pytree-compile] | 0.1839ms | 81.5744μs | 12.2587 KOps/s | 12.7514 KOps/s | |
test_compile_copy_flat[pytree-eager] | 0.1287ms | 69.2851μs | 14.4331 KOps/s | 14.7734 KOps/s | |
test_compile_assign_and_add[tensordict-compile] | 0.3752ms | 0.2171ms | 4.6068 KOps/s | 4.7323 KOps/s | |
test_compile_assign_and_add[tensordict-eager] | 2.2585ms | 1.7847ms | 560.3131 Ops/s | 548.8401 Ops/s | |
test_compile_assign_and_add[pytree-compile] | 0.3255ms | 0.2161ms | 4.6268 KOps/s | 4.7404 KOps/s | |
test_compile_assign_and_add[pytree-eager] | 1.4638ms | 1.1698ms | 854.8741 Ops/s | 839.8493 Ops/s | |
test_compile_assign_and_add_stack[compile] | 0.7429ms | 0.4748ms | 2.1061 KOps/s | 2.0952 KOps/s | |
test_compile_assign_and_add_stack[eager] | 5.2091ms | 4.5034ms | 222.0526 Ops/s | 223.7739 Ops/s | |
test_compile_indexing[tensor-tensordict-compile] | 0.1244ms | 44.9175μs | 22.2631 KOps/s | 22.0339 KOps/s | |
test_compile_indexing[tensor-tensordict-eager] | 0.7076ms | 51.3017μs | 19.4925 KOps/s | 19.7090 KOps/s | |
test_compile_indexing[tensor-tensorclass-compile] | 0.1202ms | 38.3678μs | 26.0635 KOps/s | 26.4737 KOps/s | |
test_compile_indexing[tensor-tensorclass-eager] | 0.1117ms | 29.7152μs | 33.6528 KOps/s | 32.4275 KOps/s | |
test_compile_indexing[tensor-pytree-compile] | 0.1405ms | 40.7210μs | 24.5573 KOps/s | 25.4876 KOps/s | |
test_compile_indexing[tensor-pytree-eager] | 87.7630μs | 29.5026μs | 33.8953 KOps/s | 32.6765 KOps/s | |
test_compile_indexing[slice-tensordict-compile] | 0.1820ms | 77.2662μs | 12.9423 KOps/s | 12.6952 KOps/s | |
test_compile_indexing[slice-tensordict-eager] | 0.5888ms | 29.1077μs | 34.3552 KOps/s | 33.8986 KOps/s | |
test_compile_indexing[slice-tensorclass-compile] | 0.1364ms | 71.2777μs | 14.0296 KOps/s | 14.0623 KOps/s | |
test_compile_indexing[slice-tensorclass-eager] | 93.9750μs | 24.2466μs | 41.2429 KOps/s | 41.7551 KOps/s | |
test_compile_indexing[slice-pytree-compile] | 0.1663ms | 72.3309μs | 13.8253 KOps/s | 13.9329 KOps/s | |
test_compile_indexing[slice-pytree-eager] | 82.3330μs | 23.9355μs | 41.7790 KOps/s | 41.7304 KOps/s | |
test_compile_indexing[int-tensordict-compile] | 0.1601ms | 78.4741μs | 12.7431 KOps/s | 12.6846 KOps/s | |
test_compile_indexing[int-tensordict-eager] | 1.0904ms | 29.3294μs | 34.0955 KOps/s | 34.4901 KOps/s | |
test_compile_indexing[int-tensorclass-compile] | 0.1736ms | 71.5271μs | 13.9807 KOps/s | 13.9225 KOps/s | |
test_compile_indexing[int-tensorclass-eager] | 78.2750μs | 23.9660μs | 41.7258 KOps/s | 41.3067 KOps/s | |
test_compile_indexing[int-pytree-compile] | 0.1811ms | 73.0057μs | 13.6976 KOps/s | 13.9814 KOps/s | |
test_compile_indexing[int-pytree-eager] | 94.5260μs | 24.1807μs | 41.3553 KOps/s | 41.7953 KOps/s | |
test_mod_add[eager] | 0.1048ms | 27.8822μs | 35.8652 KOps/s | 35.5649 KOps/s | |
test_mod_add[compile] | 0.1708ms | 45.0059μs | 22.2193 KOps/s | 21.5018 KOps/s | |
test_mod_add[compile-overhead] | 0.1552ms | 44.6396μs | 22.4017 KOps/s | 22.1260 KOps/s | |
test_mod_wrap[eager] | 0.4412ms | 0.2235ms | 4.4749 KOps/s | 4.4651 KOps/s | |
test_mod_wrap[compile] | 2.0367ms | 0.2094ms | 4.7746 KOps/s | 4.7724 KOps/s | |
test_mod_wrap[compile-overhead] | 2.0054ms | 0.2104ms | 4.7530 KOps/s | 4.7762 KOps/s | |
test_mod_wrap_and_backward[eager] | 13.3795ms | 11.1765ms | 89.4731 Ops/s | 89.6904 Ops/s | |
test_mod_wrap_and_backward[compile] | 13.6151ms | 11.1170ms | 89.9524 Ops/s | 88.2365 Ops/s | |
test_mod_wrap_and_backward[compile-overhead] | 12.8713ms | 11.0729ms | 90.3104 Ops/s | 88.5381 Ops/s | |
test_seq_add[eager] | 0.2313ms | 95.9548μs | 10.4216 KOps/s | 10.2137 KOps/s | |
test_seq_add[compile] | 0.1332ms | 59.9248μs | 16.6876 KOps/s | 16.5399 KOps/s | |
test_seq_add[compile-overhead] | 0.1609ms | 58.0194μs | 17.2356 KOps/s | 16.8262 KOps/s | |
test_seq_wrap[eager] | 0.7534ms | 0.4034ms | 2.4788 KOps/s | 2.4508 KOps/s | |
test_seq_wrap[compile] | 0.3665ms | 0.2298ms | 4.3519 KOps/s | 4.3287 KOps/s | |
test_seq_wrap[compile-overhead] | 0.5373ms | 0.2314ms | 4.3220 KOps/s | 4.3711 KOps/s | |
test_func_call_runtime[False-eager] | 0.8547ms | 0.5591ms | 1.7886 KOps/s | 1.7673 KOps/s | |
test_func_call_runtime[False-compile] | 0.6200ms | 0.4387ms | 2.2796 KOps/s | 2.2828 KOps/s | |
test_func_call_runtime[False-compile-overhead] | 0.6120ms | 0.4378ms | 2.2842 KOps/s | 2.2927 KOps/s | |
test_func_call_runtime[True-eager] | 1.2822ms | 0.7763ms | 1.2881 KOps/s | 1.2709 KOps/s | |
test_func_call_runtime[True-compile] | 0.6252ms | 0.4777ms | 2.0933 KOps/s | 2.0986 KOps/s | |
test_func_call_runtime[True-compile-overhead] | 0.6233ms | 0.4796ms | 2.0851 KOps/s | 2.1020 KOps/s | |
test_func_call_cm_runtime[False-eager] | 0.7936ms | 0.5619ms | 1.7797 KOps/s | 1.7703 KOps/s | |
test_func_call_cm_runtime[False-compile] | 0.6477ms | 0.4370ms | 2.2885 KOps/s | 2.2709 KOps/s | |
test_func_call_cm_runtime[False-compile-overhead] | 0.6754ms | 0.4380ms | 2.2832 KOps/s | 2.2917 KOps/s | |
test_func_call_cm_runtime[True-eager] | 1.5028ms | 0.9340ms | 1.0707 KOps/s | 1.0678 KOps/s | |
test_func_call_cm_runtime[True-compile] | 0.7260ms | 0.5096ms | 1.9624 KOps/s | 1.9853 KOps/s | |
test_func_call_cm_runtime[True-compile-overhead] | 0.7344ms | 0.5054ms | 1.9787 KOps/s | 1.9760 KOps/s | |
test_vmap_func_call_cm_runtime[eager] | 2.7046ms | 1.9863ms | 503.4567 Ops/s | 496.8840 Ops/s | |
test_vmap_func_call_cm_runtime[compile] | 0.9268ms | 0.5494ms | 1.8203 KOps/s | 1.8412 KOps/s | |
test_vmap_func_call_cm_runtime[compile-overhead] | 0.9215ms | 0.5360ms | 1.8657 KOps/s | 1.8614 KOps/s | |
test_distributed | 0.3414ms | 0.1298ms | 7.7058 KOps/s | 7.3921 KOps/s | |
test_tdmodule | 54.6320μs | 20.0235μs | 49.9414 KOps/s | 51.4084 KOps/s | |
test_tdmodule_dispatch | 67.4660μs | 40.5997μs | 24.6307 KOps/s | 26.0339 KOps/s | |
test_tdseq | 41.6670μs | 22.9679μs | 43.5390 KOps/s | 41.6432 KOps/s | |
test_tdseq_dispatch | 86.9020μs | 46.1447μs | 21.6710 KOps/s | 22.6654 KOps/s | |
test_instantiation_functorch | 2.1346ms | 1.5573ms | 642.1185 Ops/s | 642.0934 Ops/s | |
test_exec_functorch | 0.3113ms | 0.1827ms | 5.4731 KOps/s | 5.5143 KOps/s | |
test_exec_functional_call | 0.3095ms | 0.1771ms | 5.6461 KOps/s | 5.6653 KOps/s | |
test_exec_td_decorator | 0.6201ms | 0.2412ms | 4.1462 KOps/s | 4.1012 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 1.0497ms | 0.6763ms | 1.4786 KOps/s | 1.4599 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 1.0127ms | 0.6649ms | 1.5041 KOps/s | 1.4965 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 0.8611ms | 0.5476ms | 1.8261 KOps/s | 1.7914 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.7513ms | 0.5484ms | 1.8234 KOps/s | 1.8146 KOps/s | |
test_to_module_speed[True] | 2.2438ms | 1.3871ms | 720.9304 Ops/s | 717.4964 Ops/s | |
test_to_module_speed[False] | 2.0127ms | 1.3528ms | 739.2044 Ops/s | 740.3845 Ops/s | |
test_tc_init | 0.1201ms | 49.7626μs | 20.0954 KOps/s | 21.3957 KOps/s | |
test_tc_init_nested | 0.1788ms | 97.5372μs | 10.2525 KOps/s | 11.0621 KOps/s | |
test_tc_first_layer_tensor | 55.3190μs | 1.5534μs | 643.7690 KOps/s | 660.9454 KOps/s | |
test_tc_first_layer_nontensor | 21.4100μs | 4.9098μs | 203.6759 KOps/s | 205.5285 KOps/s | |
test_tc_second_layer_tensor | 37.5300μs | 2.9673μs | 337.0111 KOps/s | 361.0589 KOps/s | |
test_tc_second_layer_nontensor | 40.5250μs | 6.2076μs | 161.0933 KOps/s | 164.8450 KOps/s | |
test_unbind | 0.2898s | 16.7845ms | 59.5788 Ops/s | 71.2530 Ops/s | |
test_full_like | 20.3237ms | 13.9758ms | 71.5521 Ops/s | 107.5348 Ops/s | |
test_zeros_like | 6.3961ms | 4.5115ms | 221.6538 Ops/s | 256.4473 Ops/s | |
test_ones_like | 6.2114ms | 4.6735ms | 213.9729 Ops/s | 140.7847 Ops/s | |
test_clone | 10.5636ms | 7.0856ms | 141.1303 Ops/s | 109.9305 Ops/s | |
test_squeeze | 83.6460μs | 13.1647μs | 75.9608 KOps/s | 81.1872 KOps/s | |
test_unsqueeze | 0.2880ms | 94.8437μs | 10.5437 KOps/s | 10.5053 KOps/s | |
test_split | 0.4181ms | 0.1977ms | 5.0579 KOps/s | 4.9502 KOps/s | |
test_permute | 0.4413ms | 0.2312ms | 4.3243 KOps/s | 4.4136 KOps/s | |
test_stack | 41.5252ms | 31.3724ms | 31.8752 Ops/s | 35.8903 Ops/s | |
test_cat | 35.9056ms | 29.8498ms | 33.5011 Ops/s | 34.8183 Ops/s |

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 0.1398ms | 14.0510μs | 71.1694 KOps/s | 71.6220 KOps/s | |
test_plain_set_stack_nested | 40.8610μs | 14.0714μs | 71.0659 KOps/s | 70.5660 KOps/s | |
test_plain_set_nested_inplace | 44.4510μs | 14.9467μs | 66.9044 KOps/s | 66.6646 KOps/s | |
test_plain_set_stack_nested_inplace | 0.1871ms | 14.9939μs | 66.6940 KOps/s | 66.6082 KOps/s | |
test_items | 29.3310μs | 2.8550μs | 350.2578 KOps/s | 347.6205 KOps/s | |
test_items_nested | 0.3789ms | 0.3287ms | 3.0421 KOps/s | 3.0848 KOps/s | |
test_items_nested_locked | 0.3891ms | 0.3311ms | 3.0205 KOps/s | 3.0425 KOps/s | |
test_items_nested_leaf | 77.6720μs | 55.6077μs | 17.9831 KOps/s | 17.8809 KOps/s | |
test_items_stack_nested | 0.3918ms | 0.3327ms | 3.0060 KOps/s | 3.0146 KOps/s | |
test_items_stack_nested_leaf | 86.0220μs | 56.5979μs | 17.6685 KOps/s | 17.4597 KOps/s | |
test_items_stack_nested_locked | 0.3873ms | 0.3337ms | 2.9963 KOps/s | 3.0367 KOps/s | |
test_keys | 37.5410μs | 3.4019μs | 293.9538 KOps/s | 274.8335 KOps/s | |
test_keys_nested | 96.9030μs | 55.8942μs | 17.8910 KOps/s | 17.6889 KOps/s | |
test_keys_nested_locked | 2.5347ms | 62.1416μs | 16.0923 KOps/s | 16.1348 KOps/s | |
test_keys_nested_leaf | 74.1420μs | 46.9139μs | 21.3157 KOps/s | 21.3000 KOps/s | |
test_keys_stack_nested | 84.8720μs | 56.7365μs | 17.6254 KOps/s | 17.6302 KOps/s | |
test_keys_stack_nested_leaf | 74.2420μs | 46.9471μs | 21.3006 KOps/s | 20.5544 KOps/s | |
test_keys_stack_nested_locked | 0.1166ms | 61.4624μs | 16.2701 KOps/s | 16.1109 KOps/s | |
test_values | 5.4752μs | 0.8714μs | 1.1476 MOps/s | 1.1753 MOps/s | |
test_values_nested | 72.4920μs | 40.4113μs | 24.7456 KOps/s | 24.2922 KOps/s | |
test_values_nested_locked | 70.5510μs | 42.2952μs | 23.6434 KOps/s | 23.3334 KOps/s | |
test_values_nested_leaf | 67.9020μs | 34.9982μs | 28.5729 KOps/s | 28.0646 KOps/s | |
test_values_stack_nested | 78.5910μs | 41.3561μs | 24.1802 KOps/s | 23.8290 KOps/s | |
test_values_stack_nested_leaf | 71.7220μs | 35.8991μs | 27.8559 KOps/s | 27.6089 KOps/s | |
test_values_stack_nested_locked | 85.0520μs | 43.0954μs | 23.2044 KOps/s | 22.8338 KOps/s | |
test_membership | 1.5476μs | 0.5040μs | 1.9842 MOps/s | 1.9828 MOps/s | |
test_membership_nested | 19.1605μs | 1.9089μs | 523.8555 KOps/s | 530.4487 KOps/s | |
test_membership_nested_leaf | 13.4055μs | 1.8915μs | 528.6671 KOps/s | 531.3800 KOps/s | |
test_membership_stacked_nested | 29.7810μs | 1.9695μs | 507.7383 KOps/s | 522.3695 KOps/s | |
test_membership_stacked_nested_leaf | 32.6010μs | 1.9825μs | 504.4088 KOps/s | 516.6656 KOps/s | |
test_membership_nested_last | 38.4010μs | 2.8505μs | 350.8172 KOps/s | 351.9531 KOps/s | |
test_membership_nested_leaf_last | 26.1300μs | 2.8229μs | 354.2396 KOps/s | 355.6954 KOps/s | |
test_membership_stacked_nested_last | 29.0310μs | 3.1879μs | 313.6906 KOps/s | 234.5753 KOps/s | |
test_membership_stacked_nested_leaf_last | 29.9410μs | 3.2153μs | 311.0106 KOps/s | 237.3361 KOps/s | |
test_nested_getleaf | 35.0010μs | 6.1846μs | 161.6922 KOps/s | 161.9744 KOps/s | |
test_nested_get | 27.3600μs | 5.7342μs | 174.3936 KOps/s | 172.6007 KOps/s | |
test_stacked_getleaf | 35.0400μs | 6.0353μs | 165.6916 KOps/s | 164.5749 KOps/s | |
test_stacked_get | 33.0910μs | 5.6195μs | 177.9531 KOps/s | 174.2211 KOps/s | |
test_nested_getitemleaf | 33.8610μs | 6.1483μs | 162.6457 KOps/s | 161.1655 KOps/s | |
test_nested_getitem | 33.0800μs | 5.7548μs | 173.7666 KOps/s | 172.0980 KOps/s | |
test_stacked_getitemleaf | 37.6710μs | 6.0543μs | 165.1723 KOps/s | 163.3039 KOps/s | |
test_stacked_getitem | 33.9910μs | 5.7794μs | 173.0291 KOps/s | 173.8919 KOps/s | |
test_lock_nested | 5.0900ms | 0.4207ms | 2.3771 KOps/s | 2.3530 KOps/s | |
test_lock_stack_nested | 0.4354ms | 0.3843ms | 2.6023 KOps/s | 2.6160 KOps/s | |
test_unlock_nested | 0.7607ms | 0.3583ms | 2.7913 KOps/s | 2.7656 KOps/s | |
test_unlock_stack_nested | 0.3725ms | 0.3240ms | 3.0863 KOps/s | 3.1061 KOps/s | |
test_flatten_speed | 0.1495ms | 69.6921μs | 14.3488 KOps/s | 14.2443 KOps/s | |
test_unflatten_speed | 0.3385ms | 0.2808ms | 3.5614 KOps/s | 3.4071 KOps/s | |
test_common_ops | 1.5521ms | 1.2773ms | 782.9121 Ops/s | 731.5360 Ops/s | |
test_creation | 33.5810μs | 1.4832μs | 674.1959 KOps/s | 667.4252 KOps/s | |
test_creation_empty | 45.7410μs | 15.5172μs | 64.4445 KOps/s | 65.0973 KOps/s | |
test_creation_nested_1 | 46.3510μs | 17.3372μs | 57.6794 KOps/s | 57.3238 KOps/s | |
test_creation_nested_2 | 65.6110μs | 19.8274μs | 50.4352 KOps/s | 49.8204 KOps/s | |
test_clone | 59.8920μs | 29.6163μs | 33.7652 KOps/s | 34.1279 KOps/s | |
test_getitem[int] | 1.3547ms | 16.2531μs | 61.5269 KOps/s | 56.8697 KOps/s | |
test_getitem[slice_int] | 0.1198ms | 27.6207μs | 36.2047 KOps/s | 32.4668 KOps/s | |
test_getitem[range] | 0.2343ms | 0.1131ms | 8.8418 KOps/s | 8.8031 KOps/s | |
test_getitem[tuple] | 0.1205ms | 23.6230μs | 42.3316 KOps/s | 40.7698 KOps/s | |
test_getitem[list] | 0.2026ms | 0.1022ms | 9.7876 KOps/s | 9.2848 KOps/s | |
test_setitem_dim[int] | 70.6020μs | 46.3912μs | 21.5558 KOps/s | 19.4015 KOps/s | |
test_setitem_dim[slice_int] | 97.1420μs | 69.5090μs | 14.3866 KOps/s | 14.2352 KOps/s | |
test_setitem_dim[range] | 0.1595ms | 0.1307ms | 7.6490 KOps/s | 7.5977 KOps/s | |
test_setitem_dim[tuple] | 0.1034ms | 63.2289μs | 15.8156 KOps/s | 15.7832 KOps/s | |
test_setitem | 84.7130μs | 42.5314μs | 23.5120 KOps/s | 23.7585 KOps/s | |
test_set | 0.1153ms | 41.6547μs | 24.0069 KOps/s | 24.1831 KOps/s | |
test_set_shared | 0.3733ms | 52.0617μs | 19.2080 KOps/s | 19.3457 KOps/s | |
test_update | 0.3018ms | 50.0804μs | 19.9679 KOps/s | 19.8293 KOps/s | |
test_update_nested | 0.1190ms | 57.2637μs | 17.4631 KOps/s | 17.5418 KOps/s | |
test_update__nested | 0.1036ms | 60.4565μs | 16.5408 KOps/s | 16.6118 KOps/s | |
test_set_nested | 0.1019ms | 44.0056μs | 22.7244 KOps/s | 22.6947 KOps/s | |
test_set_nested_new | 0.1119ms | 47.5709μs | 21.0212 KOps/s | 21.3297 KOps/s | |
test_select | 0.1103ms | 61.1058μs | 16.3651 KOps/s | 16.2440 KOps/s | |
test_select_nested | 82.4820μs | 42.0248μs | 23.7955 KOps/s | 23.5172 KOps/s | |
test_exclude_nested | 0.1022ms | 58.8460μs | 16.9935 KOps/s | 16.8691 KOps/s | |
test_empty[True] | 0.2960ms | 0.2412ms | 4.1465 KOps/s | 4.0987 KOps/s | |
test_empty[False] | 4.1951μs | 0.7357μs | 1.3593 MOps/s | 1.3490 MOps/s | |
test_to | 71.3820μs | 24.8164μs | 40.2960 KOps/s | 38.6131 KOps/s | |
test_to_nonblocking | 61.9120μs | 24.1171μs | 41.4643 KOps/s | 39.3731 KOps/s | |
test_unbind_speed | 0.3135ms | 0.2823ms | 3.5428 KOps/s | 3.5121 KOps/s | |
test_unbind_speed_stack0 | 0.3618ms | 0.2816ms | 3.5516 KOps/s | 3.5456 KOps/s | |
test_unbind_speed_stack1 | 93.3092ms | 0.7086ms | 1.4113 KOps/s | 1.5315 KOps/s | |
test_split | 95.4190ms | 2.1751ms | 459.7553 Ops/s | 436.5296 Ops/s | |
test_chunk | 95.2672ms | 2.1528ms | 464.5174 Ops/s | 428.9869 Ops/s | |
test_creation[device0] | 0.2907ms | 0.1267ms | 7.8901 KOps/s | 7.5570 KOps/s | |
test_creation_from_tensor | 0.3594ms | 0.1303ms | 7.6749 KOps/s | 7.4000 KOps/s | |
test_add_one[memmap_tensor0] | 0.2198ms | 8.9249μs | 112.0467 KOps/s | 106.3867 KOps/s | |
test_contiguous[memmap_tensor0] | 33.0310μs | 2.2021μs | 454.1068 KOps/s | 447.1720 KOps/s | |
test_stack[memmap_tensor0] | 51.4410μs | 6.8241μs | 146.5391 KOps/s | 142.9476 KOps/s | |
test_memmaptd_index | 1.1631ms | 0.4293ms | 2.3293 KOps/s | 2.2891 KOps/s | |
test_memmaptd_index_astensor | 0.7244ms | 0.4791ms | 2.0874 KOps/s | 2.0080 KOps/s | |
test_memmaptd_index_op | 1.4160ms | 1.0275ms | 973.2404 Ops/s | 912.6920 Ops/s | |
test_serialize_model | 0.1316s | 0.1299s | 7.6977 Ops/s | 7.6824 Ops/s | |
test_serialize_model_pickle | 1.3515s | 1.2121s | 0.8250 Ops/s | 0.8228 Ops/s | |
test_serialize_weights | 0.2253s | 0.1426s | 7.0132 Ops/s | 7.0324 Ops/s | |
test_serialize_weights_returnearly | 0.2336s | 56.9592ms | 17.5564 Ops/s | 17.6422 Ops/s | |
test_serialize_weights_pickle | 1.3718s | 1.2164s | 0.8221 Ops/s | 0.8217 Ops/s | |
test_reshape_pytree | 63.6120μs | 35.8947μs | 27.8593 KOps/s | 27.4861 KOps/s | |
test_reshape_td | 74.9420μs | 42.1493μs | 23.7252 KOps/s | 23.3973 KOps/s | |
test_view_pytree | 66.3510μs | 35.4150μs | 28.2367 KOps/s | 27.5089 KOps/s | |
test_view_td | 85.0620μs | 46.0091μs | 21.7348 KOps/s | 20.8806 KOps/s | |
test_unbind_pytree | 63.9920μs | 35.0768μs | 28.5089 KOps/s | 27.9981 KOps/s | |
test_unbind_td | 0.5109ms | 43.7185μs | 22.8736 KOps/s | 22.9630 KOps/s | |
test_split_pytree | 0.5287ms | 47.0563μs | 21.2511 KOps/s | 21.3861 KOps/s | |
test_split_td | 0.1476ms | 55.9662μs | 17.8679 KOps/s | 17.5668 KOps/s | |
test_add_pytree | 0.1001ms | 57.7554μs | 17.3144 KOps/s | 17.5550 KOps/s | |
test_add_td | 0.1640ms | 96.5171μs | 10.3609 KOps/s | 11.0197 KOps/s | |
test_compile_add_one_nested[tensordict-compile] | 0.4282ms | 0.2127ms | 4.7012 KOps/s | 4.6114 KOps/s | |
test_compile_add_one_nested[tensordict-eager] | 0.1979ms | 0.1514ms | 6.6038 KOps/s | 6.6746 KOps/s | |
test_compile_add_one_nested[pytree-compile] | 0.1830ms | 0.1453ms | 6.8841 KOps/s | 6.8742 KOps/s | |
test_compile_add_one_nested[pytree-eager] | 0.2527ms | 0.1855ms | 5.3896 KOps/s | 5.4415 KOps/s | |
test_compile_copy_nested[tensordict-compile] | 50.8910μs | 21.9777μs | 45.5008 KOps/s | 43.5984 KOps/s | |
test_compile_copy_nested[tensordict-eager] | 90.6420μs | 44.2572μs | 22.5952 KOps/s | 22.5008 KOps/s | |
test_compile_copy_nested[pytree-compile] | 0.2377ms | 63.1173μs | 15.8435 KOps/s | 15.6026 KOps/s | |
test_compile_copy_nested[pytree-eager] | 86.7320μs | 49.0089μs | 20.4045 KOps/s | 20.4098 KOps/s | |
test_compile_add_one_flat[tensordict-compile] | 0.3861ms | 0.3217ms | 3.1088 KOps/s | 3.1230 KOps/s | |
test_compile_add_one_flat[tensordict-eager] | 0.2824ms | 0.2099ms | 4.7632 KOps/s | 4.7392 KOps/s | |
test_compile_add_one_flat[tensorclass-compile] | 0.1843ms | 0.1287ms | 7.7682 KOps/s | 7.6369 KOps/s | |
test_compile_add_one_flat[tensorclass-eager] | 0.1101ms | 59.8019μs | 16.7219 KOps/s | 15.7429 KOps/s | |
test_compile_add_one_flat[pytree-compile] | 0.3951ms | 0.3221ms | 3.1045 KOps/s | 3.1058 KOps/s | |
test_compile_add_one_flat[pytree-eager] | 0.6945ms | 0.6423ms | 1.5570 KOps/s | 1.6058 KOps/s | |
test_compile_add_self_flat[tensordict-eager] | 0.2947ms | 0.2476ms | 4.0386 KOps/s | 4.0024 KOps/s | |
test_compile_add_self_flat[tensordict-compile] | 0.3825ms | 0.3248ms | 3.0785 KOps/s | 3.0800 KOps/s | |
test_compile_add_self_flat[tensorclass-eager] | 0.1159ms | 69.4929μs | 14.3900 KOps/s | 13.7655 KOps/s | |
test_compile_add_self_flat[tensorclass-compile] | 0.1734ms | 0.1308ms | 7.6478 KOps/s | 7.4615 KOps/s | |
test_compile_add_self_flat[pytree-eager] | 0.6038ms | 0.5336ms | 1.8741 KOps/s | 1.8880 KOps/s | |
test_compile_add_self_flat[pytree-compile] | 0.3989ms | 0.3222ms | 3.1035 KOps/s | 3.1115 KOps/s | |
test_compile_copy_flat[tensordict-compile] | 67.5010μs | 18.5081μs | 54.0304 KOps/s | 55.1974 KOps/s | |
test_compile_copy_flat[tensordict-eager] | 64.4020μs | 26.7970μs | 37.3176 KOps/s | 37.1119 KOps/s | |
test_compile_copy_flat[pytree-compile] | 0.1107ms | 69.4912μs | 14.3903 KOps/s | 14.5702 KOps/s | |
test_compile_copy_flat[pytree-eager] | 79.6920μs | 51.6724μs | 19.3527 KOps/s | 19.5388 KOps/s | |
test_compile_assign_and_add[tensordict-compile] | 2.3169ms | 0.8121ms | 1.2314 KOps/s | 1.1100 KOps/s | |
test_compile_assign_and_add[tensordict-eager] | 3.4347ms | 3.2951ms | 303.4788 Ops/s | 300.5918 Ops/s | |
test_compile_assign_and_add[pytree-compile] | 2.3125ms | 0.8151ms | 1.2269 KOps/s | 1.1244 KOps/s | |
test_compile_assign_and_add[pytree-eager] | 3.5630ms | 3.3262ms | 300.6429 Ops/s | 304.4343 Ops/s | |
test_compile_indexing[tensor-tensordict-compile] | 0.1528ms | 0.1093ms | 9.1467 KOps/s | 8.8319 KOps/s | |
test_compile_indexing[tensor-tensordict-eager] | 0.1952ms | 65.8807μs | 15.1790 KOps/s | 15.0117 KOps/s | |
test_compile_indexing[tensor-tensorclass-compile] | 0.1496ms | 0.1034ms | 9.6672 KOps/s | 9.5485 KOps/s | |
test_compile_indexing[tensor-tensorclass-eager] | 0.1467ms | 44.2961μs | 22.5754 KOps/s | 22.2333 KOps/s | |
test_compile_indexing[tensor-pytree-compile] | 0.1588ms | 0.1086ms | 9.2080 KOps/s | 9.3104 KOps/s | |
test_compile_indexing[tensor-pytree-eager] | 92.9220μs | 44.2527μs | 22.5975 KOps/s | 22.4984 KOps/s | |
test_compile_indexing[slice-tensordict-compile] | 0.1989ms | 0.1379ms | 7.2541 KOps/s | 7.1707 KOps/s | |
test_compile_indexing[slice-tensordict-eager] | 0.1634ms | 25.4665μs | 39.2673 KOps/s | 38.1116 KOps/s | |
test_compile_indexing[slice-tensorclass-compile] | 0.1672ms | 0.1318ms | 7.5883 KOps/s | 7.3971 KOps/s | |
test_compile_indexing[slice-tensorclass-eager] | 56.6620μs | 20.3289μs | 49.1910 KOps/s | 46.8778 KOps/s | |
test_compile_indexing[slice-pytree-compile] | 0.1838ms | 0.1331ms | 7.5104 KOps/s | 7.2176 KOps/s | |
test_compile_indexing[slice-pytree-eager] | 56.7810μs | 20.4175μs | 48.9777 KOps/s | 47.1847 KOps/s | |
test_compile_indexing[int-tensordict-compile] | 0.1812ms | 0.1394ms | 7.1743 KOps/s | 7.1279 KOps/s | |
test_compile_indexing[int-tensordict-eager] | 0.4911ms | 24.5580μs | 40.7199 KOps/s | 38.7132 KOps/s | |
test_compile_indexing[int-tensorclass-compile] | 0.1966ms | 0.1340ms | 7.4601 KOps/s | 7.3090 KOps/s | |
test_compile_indexing[int-tensorclass-eager] | 0.1541ms | 22.5983μs | 44.2511 KOps/s | 46.9272 KOps/s | |
test_compile_indexing[int-pytree-compile] | 0.1854ms | 0.1338ms | 7.4711 KOps/s | 7.4841 KOps/s | |
test_compile_indexing[int-pytree-eager] | 65.9520μs | 20.5969μs | 48.5509 KOps/s | 47.6069 KOps/s | |
test_mod_add[eager] | 81.6420μs | 32.0422μs | 31.2088 KOps/s | 30.4546 KOps/s | |
test_mod_add[compile] | 0.3827ms | 69.8231μs | 14.3219 KOps/s | 13.9641 KOps/s | |
test_mod_add[compile-overhead] | 0.2627ms | 0.1364ms | 7.3301 KOps/s | 7.0108 KOps/s | |
test_mod_wrap[eager] | 0.3235ms | 0.2443ms | 4.0935 KOps/s | 4.0007 KOps/s | |
test_mod_wrap[compile] | 1.4681ms | 0.2998ms | 3.3359 KOps/s | 3.1661 KOps/s | |
test_mod_wrap[compile-overhead] | 7.6595ms | 4.0040ms | 249.7505 Ops/s | 248.9984 Ops/s | |
test_mod_wrap_and_backward[eager] | 1.4577ms | 1.3667ms | 731.7052 Ops/s | 687.6753 Ops/s | |
test_mod_wrap_and_backward[compile] | 1.5795ms | 1.3348ms | 749.1638 Ops/s | 686.2619 Ops/s | |
test_mod_wrap_and_backward[compile-overhead] | 1.3432ms | 0.9067ms | 1.1029 KOps/s | 971.2357 Ops/s | |
test_seq_add[eager] | 0.1498ms | 97.6527μs | 10.2404 KOps/s | 10.1878 KOps/s | |
test_seq_add[compile] | 0.1477ms | 81.0903μs | 12.3319 KOps/s | 12.1919 KOps/s | |
test_seq_add[compile-overhead] | 0.1535ms | 0.1148ms | 8.7102 KOps/s | 8.5528 KOps/s | |
test_seq_wrap[eager] | 0.4456ms | 0.3875ms | 2.5808 KOps/s | 2.5402 KOps/s | |
test_seq_wrap[compile] | 0.3812ms | 0.3176ms | 3.1487 KOps/s | 3.1004 KOps/s | |
test_seq_wrap[compile-overhead] | 0.3023ms | 0.2229ms | 4.4871 KOps/s | 4.4311 KOps/s | |
test_func_call_runtime[False-eager] | 0.8167ms | 0.7386ms | 1.3540 KOps/s | 1.3303 KOps/s | |
test_func_call_runtime[False-compile] | 0.8794ms | 0.7999ms | 1.2502 KOps/s | 1.2299 KOps/s | |
test_func_call_runtime[False-compile-overhead] | 0.4139ms | 0.3626ms | 2.7579 KOps/s | 2.7281 KOps/s | |
test_func_call_runtime[True-eager] | 0.9725ms | 0.9013ms | 1.1095 KOps/s | 1.0722 KOps/s | |
test_func_call_runtime[True-compile] | 0.9312ms | 0.8344ms | 1.1985 KOps/s | 1.1780 KOps/s | |
test_func_call_runtime[True-compile-overhead] | 0.4542ms | 0.3984ms | 2.5100 KOps/s | 2.4984 KOps/s | |
test_func_call_cm_runtime[False-eager] | 0.8102ms | 0.7407ms | 1.3501 KOps/s | 1.2517 KOps/s | |
test_func_call_cm_runtime[False-compile] | 0.9490ms | 0.8051ms | 1.2421 KOps/s | 1.2227 KOps/s | |
test_func_call_cm_runtime[False-compile-overhead] | 0.4387ms | 0.3664ms | 2.7295 KOps/s | 2.7347 KOps/s | |
test_func_call_cm_runtime[True-eager] | 1.1212ms | 1.0030ms | 996.9759 Ops/s | 983.8462 Ops/s | |
test_func_call_cm_runtime[True-compile] | 0.9491ms | 0.8624ms | 1.1595 KOps/s | 1.1391 KOps/s | |
test_func_call_cm_runtime[True-compile-overhead] | 0.4832ms | 0.4234ms | 2.3617 KOps/s | 2.3428 KOps/s | |
test_vmap_func_call_cm_runtime[eager] | 2.5686ms | 2.0924ms | 477.9122 Ops/s | 475.5572 Ops/s | |
test_vmap_func_call_cm_runtime[compile] | 0.9772ms | 0.8818ms | 1.1341 KOps/s | 1.1198 KOps/s | |
test_vmap_func_call_cm_runtime[compile-overhead] | 0.4791ms | 0.4309ms | 2.3205 KOps/s | 2.3269 KOps/s | |
test_distributed | 2.2133ms | 0.2002ms | 4.9944 KOps/s | 8.9291 KOps/s | |
test_tdmodule | 80.4520μs | 15.0300μs | 66.5335 KOps/s | 63.5575 KOps/s | |
test_tdmodule_dispatch | 57.8110μs | 28.7745μs | 34.7530 KOps/s | 34.6011 KOps/s | |
test_tdseq | 42.6210μs | 16.0971μs | 62.1231 KOps/s | 63.1077 KOps/s | |
test_tdseq_dispatch | 56.8020μs | 32.5273μs | 30.7434 KOps/s | 31.2041 KOps/s | |
test_instantiation_functorch | 2.4227ms | 1.8886ms | 529.5004 Ops/s | 522.7627 Ops/s | |
test_instantiation_td | 1.7868ms | 1.2015ms | 832.2859 Ops/s | 826.2625 Ops/s | |
test_exec_functorch | 0.2819ms | 0.2080ms | 4.8078 KOps/s | 4.6742 KOps/s | |
test_exec_functional_call | 0.2703ms | 0.2120ms | 4.7172 KOps/s | 4.6576 KOps/s | |
test_exec_td | 0.2799ms | 0.2180ms | 4.5862 KOps/s | 4.5472 KOps/s | |
test_exec_td_decorator | 0.6798ms | 0.2584ms | 3.8697 KOps/s | 3.7960 KOps/s | |
test_vmap_mlp_speed[True-True] | 0.7645ms | 0.6906ms | 1.4479 KOps/s | 1.4324 KOps/s | |
test_vmap_mlp_speed[True-False] | 0.7468ms | 0.6868ms | 1.4561 KOps/s | 1.4434 KOps/s | |
test_vmap_mlp_speed[False-True] | 0.7086ms | 0.5804ms | 1.7230 KOps/s | 1.6704 KOps/s | |
test_vmap_mlp_speed[False-False] | 0.6687ms | 0.6078ms | 1.6451 KOps/s | 1.7065 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 1.4322ms | 0.6822ms | 1.4659 KOps/s | 1.4666 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 0.8429ms | 0.6807ms | 1.4691 KOps/s | 1.4720 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 0.7100ms | 0.6085ms | 1.6434 KOps/s | 1.6749 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.7492ms | 0.6256ms | 1.5985 KOps/s | 1.6477 KOps/s | |
test_vmap_transformer_speed[True-True] | 8.8495ms | 8.4518ms | 118.3179 Ops/s | 117.7615 Ops/s | |
test_vmap_transformer_speed[True-False] | 8.9342ms | 8.4537ms | 118.2908 Ops/s | 117.7776 Ops/s | |
test_vmap_transformer_speed[False-True] | 8.4434ms | 8.1908ms | 122.0881 Ops/s | 120.7464 Ops/s | |
test_vmap_transformer_speed[False-False] | 8.3043ms | 8.1979ms | 121.9827 Ops/s | 119.8967 Ops/s | |
test_vmap_transformer_speed_decorator[True-True] | 19.8267ms | 19.7100ms | 50.7356 Ops/s | 50.6794 Ops/s | |
test_vmap_transformer_speed_decorator[True-False] | 20.7671ms | 19.8264ms | 50.4379 Ops/s | 50.1700 Ops/s | |
test_vmap_transformer_speed_decorator[False-True] | 20.7505ms | 19.6091ms | 50.9968 Ops/s | 51.3185 Ops/s | |
test_vmap_transformer_speed_decorator[False-False] | 19.6557ms | 19.5184ms | 51.2338 Ops/s | 51.1055 Ops/s | |
test_to_module_speed[True] | 1.2098ms | 0.9383ms | 1.0657 KOps/s | 1.0593 KOps/s | |
test_to_module_speed[False] | 1.3441ms | 0.9228ms | 1.0837 KOps/s | 1.0953 KOps/s | |
test_tc_init | 62.3120μs | 32.5688μs | 30.7042 KOps/s | 30.8415 KOps/s | |
test_tc_init_nested | 0.1038ms | 66.6339μs | 15.0074 KOps/s | 15.5366 KOps/s | |
test_tc_first_layer_tensor | 5.3887μs | 0.6797μs | 1.4713 MOps/s | 1.4640 MOps/s | |
test_tc_first_layer_nontensor | 33.0610μs | 2.2435μs | 445.7403 KOps/s | 441.3346 KOps/s | |
test_tc_second_layer_tensor | 47.2713μs | 1.3843μs | 722.3918 KOps/s | 730.4920 KOps/s | |
test_tc_second_layer_nontensor | 31.7110μs | 2.9376μs | 340.4139 KOps/s | 341.8278 KOps/s | |
test_unbind | 0.1956s | 12.2958ms | 81.3286 Ops/s | 90.4173 Ops/s | |
test_full_like | 0.6570ms | 0.5756ms | 1.7373 KOps/s | 1.7427 KOps/s | |
test_zeros_like | 0.2836ms | 0.1980ms | 5.0506 KOps/s | 5.0494 KOps/s | |
test_ones_like | 0.2333ms | 0.1979ms | 5.0529 KOps/s | 5.0547 KOps/s | |
test_clone | 0.4779ms | 0.4149ms | 2.4102 KOps/s | 2.4117 KOps/s | |
test_squeeze | 38.1210μs | 9.8297μs | 101.7323 KOps/s | 99.6491 KOps/s | |
test_unsqueeze | 0.2800ms | 75.0819μs | 13.3188 KOps/s | 13.1423 KOps/s | |
test_split | 0.2596ms | 0.1534ms | 6.5206 KOps/s | 6.3078 KOps/s | |
test_permute | 0.2385ms | 0.1743ms | 5.7369 KOps/s | 5.5181 KOps/s | |
test_stack | 1.2546ms | 0.8439ms | 1.1850 KOps/s | 1.1658 KOps/s | |
test_cat | 1.2476ms | 1.2314ms | 812.0726 Ops/s | 811.7995 Ops/s |
ghstack-source-id: 18a5798c5377d3e5b65e7b6c87d59917c474fd64 Pull Request resolved: #1004
x_new, y_new = torch.zeros(5, 100), torch.zeros(5, 100)
export_test = export_mod(x_new, y_new)
eager_test = test(x_new, y_new)
assert eager_test.batch_size == export_test.batch_size
@ezyang this test fails when using dynamic shapes: the eager batch size is `[5]` but the export's is `[]`. This happens with both `strict=False` and `strict=True`.
The batch size `[s0]` becomes `[]` when using dynamic shapes and when the 2nd output shape mismatches the 1st.
We do get a warning, though:
W0920 10:19:28.564000 20340 torch/fx/experimental/symbolic_shapes.py:5136] Ignored guard Eq(s0, 5) == False, this could result in accuracy problems
So, there's something a bit nontrivial going on here. In torch.compile eager, if we produce a fresh TensorDict and that TensorDict holds a list of dynamic ints, then in the residual bytecode we have to construct the TensorDict and also put in the freshly computed dynamic shapes from the FX graph (which now has some int outputs). So actually building a TensorDict isn't just a matter of putting in the right tensors; you also have to put some ints in too. Does this work?
Assuming this does work, export also has to be set up to do the same thing. It wouldn't be surprising if it didn't. In particular, if all export is doing is a pytree unflatten on Tensor leaves, the batch size won't be modified at all. To address this, we need to fix the export bug. But I also saw the comment about TensorDict not being pytree-able, so I am uncertain about the status there.
If you want to work around it, perhaps batch size can store rank instead of size and lazily compute the size from a tensor if it's not set? Better to fix things though. Just not sure what you expect to work and not work.
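The workaround suggested above (store the batch rank and derive the size on demand) could look roughly like the sketch below. It is not TensorDict code: `FakeTensor` and `RankBatchDict` are hypothetical stand-ins written in plain Python, and the sketch assumes every entry shares the same leading dimensions.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a tensor: it only carries a shape."""
    shape: Tuple[int, ...]


@dataclass
class RankBatchDict:
    """Hypothetical container that stores the batch *rank*, not the batch size."""
    tensors: Dict[str, FakeTensor]
    batch_dims: int  # number of leading dims assumed to be shared by every entry

    @property
    def batch_size(self) -> Tuple[int, ...]:
        # Derive the size lazily from the first leaf, so it always follows the data.
        if not self.tensors:
            return ()
        first = next(iter(self.tensors.values()))
        return tuple(first.shape[: self.batch_dims])


td = RankBatchDict({"x": FakeTensor((5, 100)), "y": FakeTensor((5, 100))}, batch_dims=1)
print(td.batch_size)  # (5,)

# Leaves with a different leading dim give a different batch size with no extra bookkeeping.
td2 = RankBatchDict({"x": FakeTensor((2, 100)), "y": FakeTensor((2, 100))}, batch_dims=1)
print(td2.batch_size)  # (2,)
```

The trade-off in this sketch is that an empty container cannot report a non-empty batch size, since there is no leaf to read it from.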
> Assuming this does work, export also has to be setup to do the same thing as well. It wouldn't be surprising if it didn't. In particular, if all export is doing is a pytree unflatten on Tensor leaves, the batch size won't be modified at all. To address this, we need to fix the export bug. But I also saw the comment about TensorDict not being pytree-able, so I am uncertain about the status there.

TensorDict is pytree-able, but you can deactivate that; this is what the comment is about (don't do it, or the test will fail).
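To illustrate the concern quoted above, here is a toy sketch of why rebuilding a container with a pytree-style unflatten on the leaves alone would leave the batch size untouched. The `flatten` and `unflatten` helpers and the dict-based "tensors" are hypothetical and written in plain Python; this is not how `torch.export` or TensorDict are implemented, and it shows a stale batch size rather than the empty one reported in this thread.

```python
from typing import Any, Dict, List, Tuple

# Toy "tensor": a dict holding a shape tuple (hypothetical stand-in, not torch.Tensor).
Leaf = Dict[str, Any]


def flatten(td: Dict[str, Any]) -> Tuple[List[Leaf], Dict[str, Any]]:
    """Split a toy container into its leaves and a static context."""
    keys = sorted(td["data"])
    leaves = [td["data"][k] for k in keys]
    # The batch size is recorded in the context, i.e. frozen at flatten time.
    context = {"keys": keys, "batch_size": td["batch_size"]}
    return leaves, context


def unflatten(leaves: List[Leaf], context: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild the container from new leaves and the *old* context."""
    data = dict(zip(context["keys"], leaves))
    return {"data": data, "batch_size": context["batch_size"]}


traced = {
    "data": {"x": {"shape": (5, 100)}, "y": {"shape": (5, 100)}},
    "batch_size": (5,),
}
_, ctx = flatten(traced)

# At run time only the leaves change; the context is reused as-is.
new_leaves = [{"shape": (2, 100)}, {"shape": (2, 100)}]
rebuilt = unflatten(new_leaves, ctx)
print(rebuilt["batch_size"])  # (5,): stale, although the leaves now have a leading dim of 2
```

In this toy model, keeping the batch size correct requires the rebuild step to receive the freshly computed ints as well as the tensors, which is the point made about the residual bytecode in the comment above.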
Here's what works and what doesn't:

import torch
from tensordict import TensorDict

class Test(torch.nn.Module):
    def forward(self, x: torch.Tensor, y: torch.Tensor):
        return TensorDict(
            {
                "x": x,
                "y": y,
            },
            batch_size=x.shape[0],  # integer batch size taken from the input
        )

test = Test()
x, y = torch.zeros(5, 100), torch.zeros(5, 100)
# Export with both dimensions of x and y marked as dynamic
result = torch.export.export(test, args=(x, y), strict=False, dynamic_shapes={
    "x": {0: torch.export.Dim("batch"), 1: torch.export.Dim("time")},
    "y": {0: torch.export.Dim("batch"), 1: torch.export.Dim("time")},
})
export_mod = result.module()

x_new, y_new = torch.zeros(5, 100), torch.zeros(5, 100)
export_test = export_mod(x_new, y_new)
eager_test = test(x_new, y_new)
assert torch.Size([5]) == eager_test.batch_size == export_test.batch_size  # Works because x and x_new have the same shape

x_new, y_new = torch.zeros(2, 100), torch.zeros(2, 100)
export_test = export_mod(x_new, y_new)
eager_test = test(x_new, y_new)
assert torch.Size([2]) == eager_test.batch_size == export_test.batch_size  # Fails! now export_test.batch_size is torch.Size([])
So it's weird behaviour: the SymInt just vanishes into thin air in the second case.
ghstack-source-id: ffd60b71e6e9424b81eeabee77fb8710589f6cae Pull Request resolved: #1004
Stack from ghstack (oldest at bottom):